This paper introduces XL2Bench, a benchmark for evaluating language models' ability to understand very long texts (100K+ words for English, 200K+ characters for Chinese). It covers three scenarios (fiction, papers, laws) and four tasks (memory retrieval, detailed understanding, overall understanding, open-ended generation), spanning 27 subtasks. Experiments on six leading LLMs show performance l...